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1 Introduction 

An essential assumption in estimating the source pa¬ 
rameters in electromagnetic source analysis (EMSA) 
is that the number of sources is known. If the incor¬ 
rect number of sources is assumed in the estimation 
procedure, errors could occur [1], Supek and Aine 
[2], for example, showed that if too many dipoles 
are assumed (over-modeling) then ‘spurious’ dipoles 
could arise with high standard errors, and if too few 
dipoles are assumed (under-modeling) estimates can 
be inaccurate but with low standard errors. 

Given a data set a model selection procedure con¬ 
sists of comparing different models to the data with 
a model selection criterion (sometimes called good¬ 
ness of fit function) and then deciding which model 
is best according to the criterion. Several model se¬ 
lection criteria are available for deciding how many 
active regions there are in the electro- or magneto¬ 
encephalogram (EEG/MEG). We consider two types 
of model selection criteria: regression and l ik elihood 
based criteria. 

In the regression approach the parameters are esti¬ 
mated by least squares (LS) and subsequently inserted 
in a function derived from the regression function. 
Likelihood is based on the estimation of parameters 
by maximum likelihood (ML) and then determining 
the number of sources with a likelihood ratio type of 
function. The likelihood case includes the regression 
case with a specified parametric probability distribu¬ 
tion function for the residuals [3]. 

Regression based criteria are commonly defined for 
uncorrelated residuals (noise). However, the averaged 
EEG/MEG noise is correlated (e.g., [4], [5], [6]). To 
account for the noise correlations the LS function is 
modified (see e.g., [7]). Consequently, the regression 
criteria are modified similarly. In the likelihood case 
noise correlations are accounted for. 

Many criteria have been examined for linear models. 
The inverse problem in EMSA is nonlinear, however, 
and some properties of these criteria are not obviously 
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true for nonlinear functions. This paper examines 
with a simulation study to what extent the results of 
criteria on linear models are robust to nonlinear func¬ 
tions. We first derive several cirteria briefly, then we 
compare the model selection procedures with a simu¬ 
lation in the setting of EMSA. 

2 Methods 

2.1 Notation and assumptions 

Let y i be the m-vector with measured MEG for trial 
i 1,2__ n, and let y = y i/n be the aver¬ 

age. The average MEG y is modeled by a biophys¬ 
ical function f ( 6 ) = f (X. 0 ), which is an m-vector. 
Both MEG sensor locations (first and second order) 
and orientations are collected in the m x 9 matrix X, 
and the dipole parameters are collected in the p = 5 d- 
vector 6. Then 

y = f(0) + e, (1) 

where e is the noise. We write e for y — f (6) where 
6 is a LS or ML estimate. The following assumptions 
are made 

A1 The noise £ is independently and identically dis¬ 
tributed. The noise is multivariate normally dis¬ 
tributed with zero mean and covariance S. 

A2 There are no errors in the head model. 

Although it is not required for regression based cri¬ 
teria that the noise is normally distributed, it is so for 
likelihood based criteria, but it is not a strong assump¬ 
tion. The second assumption is to indicate that errors 
in the head model are not considered here. 

2.2 Model selection criteria 

In the regression case the goodness of fit of the 
regression of y on f (6) can be defined by 1 — 
(e'V 'e/y'y), where the residuals are weighted by 
V = S/s 2 andS = £" = i(y*-J)(y 2 1) 
is an estimate of S, and s 2 is the average of the diago¬ 
nal elements of S. The weighting matrix V accounts 
for noise heteroscedasticity and corrrelations but not 
for the average noise level. The percent variance (PV) 




then be defined by 


PV = 100 1 - 


y'y 


( 2 ) 


This is unlike the more common definition of the PV 
(see e.g. [2], [8]). Deviations from 1 (a good fit) 
should indicate that the source model is inadequate. 
A model is selected either by choosing an arbitrary 
value, such as 95%, or by comparing the increase of 
the PV from model to model. 

A different approach is to consider the ratio of two 
different estimates of noise. The first is an estimate 
of the modeling errors e'V x e, and the second is 
an estimate of pure noise variance s 2 = Ya =i (y* — 
y) , (y* — y)/m. If we take V = S/s 2 then we have 
the ratio of the two different estimates of noise vari¬ 
ance e'V 'e/s 2 = e'S 'e, where S is as defined 
above. This ratio is an indication of the closeness of 
the errors due to modeling to pure noise. This ra¬ 
tio leads to Hotelling’s T 2 , which we call lack of fit, 
which is defined by 


LOF = e'S 


(n — l)rr 


( 3 ) 


which is F distributed with rn and n — m degrees of 
freedom. If we find a value of LOF which is larger 
than the criterion value associated with F(m : n — 
m, p), then we reject the model. 

In the likelihood case a criterion is approximated 
by the likelihood function. Akaike’s information 
criterion (AIC) is based on the approximation of 
the so called Kullback-Leibler distance by the nat¬ 
ural logarithm of the likelihood ratio. The likeli¬ 
hood ratio is defined by A = max{L(y; 6) : 6 E 
©o}/ max{L(y; 6 ) : 0 E 0p} where L(y; 6 ) = 
—mn ln(27r)/2 — nln |S|/2 — e'S _1 e/2, and the pa¬ 
rameter space @o is associated with the hypothesized 
model, and &/> is the parameter space associated 
with the saturated model (i.e. including all possible 
parameters P ). The log-likelihood ratio In (A) is in 
general biased. The correction of this bias results in 
ln(A) + 2 p — P. Furthermore, the denominator in A 
can be omited since the saturated model is constant. 
We then obtain the AIC, defined by (see [9] for a clear 
derivation of the AIC). 


AIC = —2L(y; 6) ± 2p. (4) 

A model is selected by the AIC if the value AIC is 
minimal over all models. 

If a Bayesian point of view is taken then models are 
compared with the so called Bayes factor, defined by 


y)fp{0y. y) where 6, represents a model, and 
p(-) is a probability density function. We have as¬ 
sumed that the prior distributions of the models are 
noninformative (that is uniform). If we take the natu¬ 
ral logarithm of the Bayes factor then it is possible to 
approximate this by In (A (i,j)) ± In {m)(pj — Pi)/2, 
where X(i. j) is the likelihood ratio for models 6, and 
Oj. If we take the model of the denominater to be 
the saturated model, then we can omit this. From this 
we have the Bayesian information criterion defined 
by (see [10] for a clear view on the BIC) 

BIC = —2L(y; 6) ± ln(m)p. (5) 

The model with the minimal BIC value over all mod¬ 
els is selected. 

3 Simulation 

All four model selection procedures are compared in 
their accuracy to determine the number of sources in 
simulated MEG data. 

3.1 Forward and inverse computations 

Two dipolar sources were generated in a spherical 
head model with radius R = 10cm. There were 
127 second order axial gradiometers. The orienta¬ 
tions of the sensors were orthogonal to the radius of 
the sphere. The sensors were positioned at 11.932cm 
from the origin of the sphere in the following manner: 
1 sensor at the vertex, the other 126 were positioned 
in rings around the sphere at 15 degrees intervals (un¬ 
til 90 degrees) with 6, 12, 18, 24, 30, and 36 sensors 
on each ring. 

The position of the two sources was determined by 
significance testing. The two sources were just sig¬ 
nificantly (p h 0.05) distinguishable according to the 
statistic 


(d 2 -e 1 )'c- 1 (d 2 -e 1 )/s 2 q , (6) 

where C is the parameter covariance matrix, £ is the 
noise variance, and q is the number of parameters in 
each 6i . This statistic is F(q. m — p*. a) distributed, 
with p* the number of free parameters. In this case 
we obtained F( 8, 111) = 1.96 withp = 0.058. The 
Euclidean position and orientation coordinates of the 
two sources are respectively (1.57, ±0.90, 6.76), and 
(5.86, ±3.38,—1.75). The two sources were sepa¬ 
rated by nearly 2cm. The two dipoles selected with 
this criterion represent a class of two dipolar sources 




which are just separable with an arbitrary noise vari¬ 
ance. 

Homoscedastic but correlated noise was added to 
each sensor. The noise standard deviation was cho¬ 
sen to be 10% of the absolute maximal signal over all 
sensors. Noise correlations were generated with the 
model rij = exp (—dij/a), where dy denotes the dis¬ 
tance between sensors and a determines the strength 
of the correlations. For a = 0.5 the correlations var¬ 
ied between 0.008 and 0.659 with a mean of 0.105. 
400 simulations were generated with each 300 trials. 
For the inverse computations we only estimate source 
parameters, and no noise covariance parameters. 
Consequently, in this case ML and GLS estimates 
are equivalent. An estimate satisfied the condition 
min{e , S _1 e : 6 £ 0}. Source models with 1, 2, and 
3 dipoles were estimated, which covers both under¬ 
and over-modeling. For each estimated set of param¬ 
eters the four criteria were evaluated. 

3.2 Results 

When two sources were estimated the location param¬ 
eters were on average quite good (see Table 1). Esti¬ 
mation of one source led to a location that was in the 
middle of the two true sources. When three sources 
were estimated the location of only one of the sources 
was similar to the left true source. 

Table 1: The true and averaged estimated (1, 2, and 
3) source location parameters _ 


parameter 

true 


2 sources 


left 

right 

left 

right 

X 

1.6 

1.6 

1.28 

1.39 

y 

-0.9 

0.9 

-0.56 

0.2 

z 

6.8 

6.8 

6.46 

5.15 

parameter 

1 source 


3 sources 


X 

0.9 

1.2 

1.59 

0.79 

y 

0.0 

0.55 

-6.88 

0.21 

z 

6.2 

5.85 

6.8 

1.82 


A model selection procedure was classified as good 
if it selected the correct (2 dipole) model. A correct 
decision with the PV is when the PV with 2 sources 
shows an increase of at least 0.15% compared to the 
PV with 1 source and if the PV with 3 sources shows 
an increase less than 0.15%. For the LOF the 1 source 
model should be rejected (i.e. p < 0.05) and the 2 
source model should be accepted (i.e. p > 0.05). For 
both the AIC and BIC a correct decision is defined as 
the minimum value of the AIC or BIC of all evalu¬ 


ated models. The percentages of correct decisions are 
displayed in Fig. 1. 
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Figure 1: Percentages of correct decisions, selecting 
the model with 2 dipoles. 

From these percentages the AIC should be the pre¬ 
ferred model selection procedure. In a few cases the 
model with 3 sources was selected for both the AIC 
and BIC. The PV nearly always selected the model 
with 3 sources and seldomly the model with 1 source. 
The LOF selected mostly the model with 1 source and 
a few times the model with 3 sources. 

4 Discussion 

Both AIC and BIC seem quite robust against the non¬ 
linearity of the MEG function, since the percentages 
are in accordance with theoretical results obtained 
from linear models [11]. 

The PV is inadequate since the PV with 3 sources 
gives the highest PV value. It is irrelevant if the crite¬ 
rion of increase in PV is chosen or an arbitrary value 
like 95%, the PV will indicate that the model with the 
highest PV value is the best one. Moreover, if the PV 
value indicates that a model is bad, then it is not clear 
if this is due to a high noise level or to large modeling 
errors [2], The PV can be improved by also correcting 
the denominator similar to the nominator. 

The poor performance of the LOF was somewhat 
surprising, but can be explained. Hotteling’s T 2 
is designed to compare a vector of means to a hy¬ 
pothesized vector of constants. Instead we have in¬ 
serted an estimate for the hypothesized vector. The 
(co)variance components of the parameter estimates 
are therfore completely ignored. Currently we are 
working on an improvement of the LOF to incorpo- 


rate the (co)variance components of the parameter es¬ 
timates. 
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